In bioinformatics, a DNA read error occurs when the sequencing process reports one DNA base in place of a different base. An assembler then uses the reads to build a de Bruijn graph, which can be analyzed in various ways to find the read errors.

== Overview ==

Each node of a de Bruijn graph is a ''k''-mer, a string of ''k'' bases drawn from A, C, G and T, so there are 4^k possible nodes. The number of nodes used to build the graph can be reduced by considering only the ''k''-mers that actually occur in the DNA strand of interest. Given sequence 1, we can determine the nodes of size 7, or 7-mers, that will be in the graph. These 7-mers form the graph shown in figure 1, a very simple example of a de Bruijn graph.〔 ''De Bruijn Graph of a small sequence''. (2011). Retrieved Feb 7, 2015, from Homolog.us - Bioinformatics: http://www.homolog.us/Tutorials/index.php?p=2.1&s=1〕 The graph is formed by taking the last 6 bases of each 7-mer and linking that node to the node whose first 6 bases are the same.

Figure 1 is the simplest form a de Bruijn graph can take: each node has at most one edge in and one edge out, so the whole graph is a single path. In practice, most graphs contain nodes with more than one incoming edge, more than one outgoing edge, or both. This follows from how edges are defined: an edge points from one node to another if, and only if, the last ''k-1'' bases of the first ''k''-mer match the first ''k-1'' bases of the second. Because several ''k''-mers can share the same ''k-1'' bases, a single node can gain several incoming or outgoing edges. Such branching arises from either read errors or variations among DNA strands, and both make it difficult to determine the correct structure of the DNA and what is causing the differences.
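The node and edge rules above can be sketched in a few lines of Python. This is a minimal illustration, not code from the cited tutorial: sequence 1 is not reproduced here, so the 20-base `reference` and the single-substitution `error_read` are hypothetical stand-ins, and the helper names `kmers`, `de_bruijn` and `branching_nodes` are invented for the example. As in figure 1, each 7-mer is a node, and an edge joins two nodes when the last 6 bases of one equal the first 6 bases of the other.

```python
from __future__ import annotations

from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Return the distinct k-mers (length-k substrings) that occur in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def de_bruijn(nodes: set[str]) -> dict[str, set[str]]:
    """Map each k-mer u to the k-mers v whose first k-1 bases equal u's last k-1 bases."""
    by_prefix: dict[str, set[str]] = defaultdict(set)
    for v in nodes:
        by_prefix[v[:-1]].add(v)
    return {u: set(by_prefix.get(u[1:], ())) for u in nodes}


def branching_nodes(graph: dict[str, set[str]]) -> set[str]:
    """Return nodes with more than one incoming or more than one outgoing edge."""
    in_degree: dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_degree[v] += 1
    return {u for u, targets in graph.items() if len(targets) > 1 or in_degree[u] > 1}


K = 7
reference = "ATGCGTACCTGAAGTCCATG"                   # hypothetical error-free sequence
error_read = reference[:10] + "C" + reference[11:]  # same sequence with G -> C at position 10

clean = de_bruijn(kmers(reference, K))
noisy = de_bruijn(kmers(reference, K) | kmers(error_read, K))

print(4 ** K, len(clean), len(noisy))  # 16384 14 21
print(sorted(branching_nodes(clean)))  # []
print(sorted(branching_nodes(noisy)))  # ['AAGTCCA', 'CGTACCT']
```

Only 14 of the 16,384 possible 7-mers occur in the hypothetical sequence, and they form a simple path like figure 1. The substituted base changes all seven 7-mers that overlap it, so the combined graph gains seven extra nodes that leave the error-free path at `CGTACCT` and rejoin it at `AAGTCCA`, the two branching nodes reported.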
Because most sets of reads contain both read errors and genuine variations, researchers aim to use an assembly process that first cleans the graph of the vertices and edges created by errors and then merges nodes that are unambiguously connected.〔 Simpson, J. T., Wong, K., Jackman, S. D., Schein, J. E., Jones, S. J., & Birol, I. (2009). ABySS: a parallel assembler for short read sequence data. ''Genome Research, 19''(6), 1117-1123.〕
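The two steps in this paragraph, cleaning and then merging, can be sketched in Python under stated assumptions. The cleaning step here is a simple coverage filter that discards ''k''-mers occurring fewer than `min_count` times across all reads, chosen for illustration because a random error rarely recurs in many reads; it is not presented as the method ABySS uses. The merging step collapses each chain in which every link is the only edge out of one node and the only edge into the next. The names `count_kmers`, `merge_unambiguous` and the sample reads are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count how many times each k-mer occurs across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def merge_unambiguous(nodes: set[str]) -> list[str]:
    """Collapse every unambiguous chain of k-mers into one contig string.

    Edge u -> v exists iff the last k-1 bases of u equal the first k-1 bases of v.
    A chain continues through u -> v only when v is u's sole successor and
    u is v's sole predecessor.
    """
    by_prefix: dict[str, list[str]] = defaultdict(list)
    for v in nodes:
        by_prefix[v[:-1]].append(v)
    out_edges = {u: by_prefix.get(u[1:], []) for u in nodes}

    preds: dict[str, list[str]] = defaultdict(list)
    for u, outs in out_edges.items():
        for v in outs:
            preds[v].append(u)

    visited: set[str] = set()

    def walk(start: str) -> str:
        contig, cur = start, start
        visited.add(start)
        while len(out_edges[cur]) == 1:
            nxt = out_edges[cur][0]
            if len(preds[nxt]) != 1 or nxt in visited:
                break  # branch point, or we came back around a cycle
            contig += nxt[-1]  # consecutive k-mers overlap by k-1 bases
            visited.add(nxt)
            cur = nxt
        return contig

    def starts_chain(v: str) -> bool:
        # v begins a contig unless it is the sole successor of a node with one successor
        return len(preds[v]) != 1 or len(out_edges[preds[v][0]]) != 1

    contigs = [walk(v) for v in sorted(nodes) if starts_chain(v)]
    # Any node still unvisited lies on an isolated cycle with no branch points.
    for v in sorted(nodes):
        if v not in visited:
            contigs.append(walk(v))
    return contigs


K = 7
reference = "ATGCGTACCTGAAGTCCATG"                   # hypothetical sequence
error_read = reference[:10] + "C" + reference[11:]  # one substituted base (G -> C)
reads = [reference] * 3 + [error_read]              # the error appears in only one read

counts = count_kmers(reads, K)
for min_count in (1, 2):
    kept = {kmer for kmer, n in counts.items() if n >= min_count}
    print(min_count, merge_unambiguous(kept))
# 1 ['AAGTCCATG', 'ATGCGTACCT', 'GTACCTCAAGTCC', 'GTACCTGAAGTCC']
# 2 ['ATGCGTACCTGAAGTCCATG']
```

With `min_count=1` the erroneous 7-mers survive, and the bubble they form splits the assembly into four short contigs. With `min_count=2` they are discarded, and the remaining chain merges back into the full 20-base `reference`. The threshold is an assumption of this sketch; on its own, a coverage filter cannot tell a read error from a genuine variant that happens to be rare in the reads.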